README FILE

"Political Divisions in Large Cities: The Socio-Spatial Basis of Legislative Behavior 
in Chicago and Toronto"
Journal of Politics

Zack Taylor, University of Western Ontario, zack.taylor@uwo.ca
David A. Armstrong II, University of Western Ontario, dave.armstrong@uwo.ca

Original submission: November 2, 2023
Updated: March 13, 2024

INSTRUCTIONS
------------
(Updated March 13, 2024)

Due to the number of operations involved, the processing is divided into 3 R scripts:

chicago_replication.R
toronto_replication.R
assemble_figures.R

We recommend clearing R's global environment prior to running each script. Depending
on the amount of memory your computer has, you may need to restart your R session.
The assemble_figures.R script must be run last. 

Prior installation of the h2o library is required. See: 
https://docs.h2o.ai/h2o/latest-stable/h2o-docs/downloading.html
Note that restarting your R session will not necessarily shut down the h2o instance. 
To purge the data in the h2o stack, restart your computer or shut down h2o by 
issuing the h2o.shutdown() command in the R console, which will tell you whether 
h2o is running or not. 

The analysis uses legR, an R package written by the authors. Install it with:
remotes::install_github("davidaarmstrong/legr")

The required source data files are located in the /source/ folder. The scripts create
a series of temporary files in the /temp/ folder. Figures and Tables are exported to 
the /tabsfigs/ folder. The /temp/ and /tabsfigs/ folders must be created prior to 
running the scripts.
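
The folder setup described above can be done from the R console. The following is a 
minimal sketch, assuming the working directory is the root of the replication package; 
the helper name ensure_dirs is hypothetical and is not part of the replication scripts.

```r
# Hypothetical helper: create the output folders the scripts expect.
# Assumes `root` is the top-level folder of the replication package.
ensure_dirs <- function(root = ".", dirs = c("temp", "tabsfigs")) {
  paths <- file.path(root, dirs)
  for (p in paths) {
    # Only create folders that do not exist yet.
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}

ensure_dirs()
```

Calling ensure_dirs() a second time is harmless, since existing folders are skipped.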

Three successful replication runs with associated log files can be found at:
https://www.dropbox.com/scl/fo/t7nlw4ugod5oi74zrhyq8/h?rlkey=h6vzapg91f0s77l0c6cj3bizi&dl=1


IMPLICATIONS OF RANDOMNESS FOR REPLICATION
------------------------------------------
(Added March 13, 2024)

The dimensional reduction algorithm applied to the roll-call votes contains randomness, 
which has implications for replication. The sources of this randomness are as follows.  
First, the raw data are subjected to a generalized low rank model (GLRM), the results 
of which will differ depending on starting conditions.  The results of the GLRM are 
then used as initial estimates of the latent variable to identify the candidate votes 
for each dimension of the analysis.  Second, the votes are passed into a Bayesian 
one-dimensional IRT algorithm, the results of which will also differ depending on 
starting conditions.  Third, some of the results – namely the analysis of significant 
moves – depend on draws from the latent variable posteriors. These draws are random 
and thus will change with each fresh execution of the algorithm. Consequently, 
differences in the GLRM stage due to randomness can propagate through the rest of 
the algorithm.

We facilitate replication by setting the random number generator seed in all the 
places where it can be set – namely the GLRM and the Bayesian IRT.  With the seed 
fixed, we can recreate the numbers reported in the manuscript.  However, because 
randomness in the GLRM propagates (i.e., compounds) through the subsequent model 
steps, results obtained with different seeds can differ.  These differences across 
executions of the algorithm may be more than rounding error, but should not change 
the substantive conclusions that we draw from the scores. 
The seed used to generate the data shown in the published paper is 3094.
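
How a fixed seed makes stochastic output repeatable can be illustrated with base R. 
This is a minimal sketch and not the authors' code: the replication scripts set the 
seed inside the GLRM and Bayesian IRT steps, whereas rnorm() here merely stands in 
for any random draw. Only the seed value 3094 comes from this README.

```r
seed <- 3094  # seed reported in this README

# Two runs with the same seed produce the same draws.
set.seed(seed)
first_run <- rnorm(5)

set.seed(seed)
second_run <- rnorm(5)

same_seed_matches <- identical(first_run, second_run)

# A different seed (seed + 1 is an arbitrary choice) produces different draws.
set.seed(seed + 1)
other_run <- rnorm(5)

other_seed_matches <- identical(first_run, other_run)

print(c(same_seed = same_seed_matches, other_seed = other_seed_matches))
```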

Due to the randomness described above, a run of a script may occasionally fail. This 
occurs when a random element of the process produces a NULL value that subsequent 
elements cannot accept. The remedy is to restart R and the h2o instance and re-run 
the script. 

See On-Line Appendix E for details and analysis of variability in output using 
simulations.


SOURCE DATA FILES
-----------------

varname_lookup.xlsx
Lookup table containing names associated with variable codes.

tor_rollcalls.rda
R data object containing roll call data for Toronto.

tor_ward_data.dta
Stata data file containing ward profiles for Toronto.

tor_ward_councilors.dta
Stata data file containing information about Toronto councilors.

chi_rollcalls.rda
R data object containing roll call data for Chicago.

chi_ward_councilors.dta
Stata data file containing information about Chicago councilors.

chi_ward_data.dta
Stata data file containing ward profiles for Chicago.

WARD SHAPEFILES FOR APPENDIX D, MORAN'S I ANALYSIS:
(Note that Dataverse may zip Esri shapefile collections. 
These must be unzipped before running the scripts.)

ward_shp/tor_wards28.shp
ward_shp/tor_wards44.shp
ward_shp/tor_wards25.shp
ward_shp/chi_ward1970.shp
ward_shp/chi_ward1981.shp
ward_shp/chi_ward1986.shp
ward_shp/chi_ward1992.shp
ward_shp/chi_ward1998.shp
ward_shp/chi_ward2002.shp
ward_shp/chi_ward2012.shp
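
Before running the scripts, it may help to confirm that the files listed above are in 
place and that the shapefiles have been unzipped. The following is a minimal sketch: 
the file names are copied from this README, but the helper name missing_sources and 
the assumption that ward_shp/ sits inside the source folder are ours and should be 
adjusted to the actual layout.

```r
# File names copied from the list in this README.
source_files <- c(
  "varname_lookup.xlsx",
  "tor_rollcalls.rda",
  "tor_ward_data.dta",
  "tor_ward_councilors.dta",
  "chi_rollcalls.rda",
  "chi_ward_councilors.dta",
  "chi_ward_data.dta"
)

# Assumption: the ward_shp folder is located inside the source folder.
shapefiles <- file.path("ward_shp", c(
  "tor_wards28.shp", "tor_wards44.shp", "tor_wards25.shp",
  "chi_ward1970.shp", "chi_ward1981.shp", "chi_ward1986.shp",
  "chi_ward1992.shp", "chi_ward1998.shp", "chi_ward2002.shp",
  "chi_ward2012.shp"
))

# Hypothetical helper: return the expected files that are absent from `dir`.
missing_sources <- function(dir = "source") {
  expected <- file.path(dir, c(source_files, shapefiles))
  expected[!file.exists(expected)]
}

missing <- missing_sources()
if (length(missing) > 0) {
  message("Missing source files:\n", paste(missing, collapse = "\n"))
}
```

An empty result means every listed file was found. Note that an Esri shapefile also 
needs its companion files (for example .dbf and .shx), which this check does not cover.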


SESSION INFO
------------
(Updated March 13, 2024)

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Toronto
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] legR_0.1.0         remotes_2.4.2.1    h2o_3.44.0.3       dwnominate_1.2     wnominate_1.4      ggpubr_0.6.0      
 [7] ggridges_0.5.6     ggrepel_0.9.5      spdep_1.3-3        spData_2.3.0       sp_2.1-3           sf_1.0-15         
[13] emIRT_0.0.13       Rcpp_1.0.12        pscl_1.5.9         progress_1.2.3     scales_1.3.0       cowplot_1.1.3     
[19] readstata13_0.10.1 rio_1.0.1          corrr_0.4.4        stringr_1.5.1      tidyr_1.3.1        ggplot2_3.5.0.9000
[25] purrr_1.0.2        dplyr_1.1.4        tibble_3.2.1      

loaded via a namespace (and not attached):
 [1] Rdpack_2.6           DBI_1.2.0            bitops_1.0-7         deldir_2.0-2         s2_1.1.6            
 [6] readxl_1.4.3         rlang_1.1.3          magrittr_2.0.3       e1071_1.7-14         compiler_4.3.2      
[11] mgcv_1.9-1           vctrs_0.6.5          pkgconfig_2.0.3      wk_0.9.1             shape_1.4.6         
[16] crayon_1.5.2         backports_1.4.1      labeling_0.4.3       utf8_1.2.4           nloptr_2.0.3        
[21] bit_4.0.5            glmnet_4.1-8         jomo_2.7-6           logistf_1.26.0       jsonlite_1.8.8      
[26] pan_1.9              broom_1.0.5          prettyunits_1.2.0    R6_2.5.1             stringi_1.8.3       
[31] car_3.1-2            boot_1.3-28.1        rpart_4.1.23         cellranger_1.1.0     iterators_1.0.14    
[36] R.utils_2.12.3       Matrix_1.6-5         splines_4.3.2        nnet_7.3-19          tidyselect_1.2.0    
[41] rstudioapi_0.15.0    abind_1.4-5          codetools_0.2-19     curl_5.2.0           lattice_0.22-5      
[46] withr_3.0.0          survival_3.5-7       units_0.8-5          proxy_0.4-27         pillar_1.9.0        
[51] carData_3.0-5        mice_3.16.0          KernSmooth_2.23-22   foreach_1.5.2        generics_0.1.3      
[56] RCurl_1.98-1.14      hms_1.1.3            munsell_0.5.0        minqa_1.2.6          class_7.3-22        
[61] glue_1.7.0           tools_4.3.2          data.table_1.15.0    lme4_1.1-35.1        RSpectra_0.16-1     
[66] ggsignif_0.6.4       grid_4.3.2           rbibutils_2.2.16     colorspace_2.1-0     nlme_3.1-164        
[71] formula.tools_1.7.1  cli_3.6.2            fansi_1.0.6          basicspace_0.24      gtable_0.3.4        
[76] R.methodsS3_1.8.2    rstatix_0.7.2        operator.tools_1.6.3 classInt_0.4-10      farver_2.1.1        
[81] R.oo_1.26.0          lifecycle_1.0.4      mitml_0.4-5          bit64_4.0.5          MASS_7.3-60.0.1     